library(datasets)
First, let's bring in the Pokemon dataset.
Pokemon = read.csv(file = 'data/Pokemon.csv')
head(Pokemon)
##   X.                  Name Type.1 Type.2 Total HP Attack Defense Sp..Atk
## 1  1             Bulbasaur  Grass Poison   318 45     49      49      65
## 2  2               Ivysaur  Grass Poison   405 60     62      63      80
## 3  3              Venusaur  Grass Poison   525 80     82      83     100
## 4  3 VenusaurMega Venusaur  Grass Poison   625 80    100     123     122
## 5  4            Charmander   Fire          309 39     52      43      60
## 6  5            Charmeleon   Fire          405 58     64      58      80
##   Sp..Def Speed Generation Legendary
## 1      65    45          1     False
## 2      80    60          1     False
## 3     100    80          1     False
## 4     120    80          1     False
## 5      50    65          1     False
## 6      65    80          1     False
What is the most common type for Pokemon?
Since a Pokemon can have a secondary type, we will have to combine the observations from the two variables, Type.1 and Type.2.
But first, let's see what the data shows for Pokemon that have only one type.
# this is the Type.2 for Charmander,
# a Pokemon that has a single "Fire" type
Pokemon$Type.2[5]
## [1] ""
The data displays “” (an empty string) for a Pokemon that has no Type.2. This will muddy our data if we’re looking for the most common type, since “” counts as a valid value as far as the code is concerned.
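One way to keep the empty string from being counted as a type is to read it in as a missing value. The sketch below is only an illustration on a hypothetical three-row CSV (`csvText` and `toy` are made-up names, not part of the original analysis); it relies on the `na.strings` argument of `read.csv()`.

```r
# Hypothetical three-row CSV standing in for data/Pokemon.csv
csvText <- "Name,Type.1,Type.2
Bulbasaur,Grass,Poison
Charmander,Fire,
Squirtle,Water,"

# Treat empty fields as NA while reading
toy <- read.csv(text = csvText, na.strings = c("NA", ""), stringsAsFactors = FALSE)

# table() leaves NA out by default, so only real types are counted
table(toy$Type.2)

# The number of single-type Pokemon is still easy to recover
sum(is.na(toy$Type.2))
```

With this approach there is no “” entry to drop from the table later. The analysis here keeps the empty string and removes it by hand instead.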
These are the tables for Type.1 and Type.2 variables.
table(Pokemon$Type.1)
##
##      Bug     Dark   Dragon Electric    Fairy Fighting     Fire   Flying
##       69       31       32       44       17       27       52        4
##    Ghost    Grass   Ground      Ice   Normal   Poison  Psychic     Rock
##       32       70       32       24       98       28       57       44
##    Steel    Water
##       27      112
table(Pokemon$Type.2)
##
##               Bug     Dark   Dragon Electric    Fairy Fighting     Fire
##      386        3       20       18        6       23       26       12
##   Flying    Ghost    Grass   Ground      Ice   Normal   Poison  Psychic
##       97       14       25       35       14        4       34       33
##     Rock    Steel    Water
##       14       22       14
# how many types are in Type.1
length(table(Pokemon$Type.1))
## [1] 18
# how many types are in Type.2
length(table(Pokemon$Type.2))
## [1] 19
As you can see, table(Pokemon$Type.2) contains an extra type, “”, with a count of 386. It represents all of the observations/Pokemon that have no Type.2.
Below is the combined table of Type.1 and Type.2, minus the empty “” type. We need to be careful here, because the combined table counts type entries, not Pokemon: every dual-type Pokemon is counted twice. That is why the sum of the table is 1214 even though we only have 800 observations/Pokemon (800 Type.1 entries plus 800 - 386 = 414 Type.2 entries).
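# [-1] drops the empty "" entry; the element-wise sum lines up because both tables then list the same 18 types in the same alphabetical order
combinedTypeTable = (table(Pokemon$Type.1) + table(Pokemon$Type.2)[-1])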
paste("The sum is: ", sum(combinedTypeTable))
## [1] "The sum is: 1214"
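The double counting can be checked on a small example. The sketch below uses a hypothetical four-row data frame (`toy`, `pooled`, `pooledTable` and `nDual` are illustrative names, not part of the original analysis). It counts types by pooling both columns into one vector, which does not depend on the two tables listing the same types in the same order.

```r
# Hypothetical data: two dual-type and two single-type Pokemon
toy <- data.frame(
  Type.1 = c("Grass", "Fire", "Water", "Water"),
  Type.2 = c("Poison", "", "", "Ice"),
  stringsAsFactors = FALSE
)

# Stack both columns into one vector and drop the empty entries
pooled <- c(as.character(toy$Type.1), as.character(toy$Type.2))
pooled <- pooled[pooled != ""]
pooledTable <- table(pooled)

# Each dual-type Pokemon contributes two entries, so the total is
# the number of rows plus the number of rows with a second type
nDual <- sum(toy$Type.2 != "")
sum(pooledTable) == nrow(toy) + nDual
```

The same identity explains the figure above: 800 rows plus 414 rows with a second type gives 1214 entries.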
Finally, let's see which type is the most common for Pokemon, either as their Type.1 or their Type.2.
# find the maximum count in the table, then select the entry (or entries) equal to it; this returns both the count and its name, in our case the type
combinedTypeTable[which(combinedTypeTable == max(combinedTypeTable))]
## Water
## 126
As you can see, the most common type for a Pokemon is Water, with 126 entries.
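The comparison against max() used above returns every entry that shares the top count. Base R also offers which.max(), which returns only the first maximum. The short sketch below contrasts the two on a hypothetical named vector with a tie (`counts` is illustrative data, not taken from the dataset).

```r
# Hypothetical counts with a tie for first place
counts <- c(Water = 5, Fire = 3, Grass = 5)

# Selecting on equality with the maximum keeps all tied entries
allTop <- counts[counts == max(counts)]

# which.max() stops at the first maximum it finds
firstTop <- names(which.max(counts))
```

For the Pokemon data there is a single winner, so both approaches would agree.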
As always, a visualization helps put this in context.
combinedTypeTable = sort(combinedTypeTable, decreasing=TRUE)
barplot(combinedTypeTable[1:5], main = "TOP 5 MOST COMMON TYPES IN POKEMON", ylab="Frequency", col="lightblue")
What generation has the greatest number of each “type” (Type.1 and Type.2 combined) of Pokemon?
First we need to know how many different generations are present in our data.
table(Pokemon$Generation)
##
## 1 2 3 4 5 6
## 166 106 160 121 165 82
There are 6 different generations of Pokemon listed in the data, with a varying number of Pokemon for each generation.
Now we need to split the data so that we can see how many of each type are in each generation.
# table for the number of each Type.1 type for all entries where the Generation is 1
table(Pokemon$Type.1[which(Pokemon$Generation == 1)])
##
##      Bug   Dragon Electric    Fairy Fighting     Fire    Ghost    Grass
##       14        3        9        2        7       14        4       13
##   Ground      Ice   Normal   Poison  Psychic     Rock    Water
##        8        2       24       14       11       10       31
# table for the number of each Type.2 type for all entries where the Generation is 1
table(Pokemon$Type.2[which(Pokemon$Generation == 1)])
##
##              Dark   Dragon    Fairy Fighting   Flying    Grass   Ground
##       88        1        1        3        2       23        2        6
##      Ice   Poison  Psychic     Rock    Steel    Water
##        3       22        7        2        2        4
length(table(Pokemon$Type.1[which(Pokemon$Generation == 1)]))
## [1] 15
length(table(Pokemon$Type.2[which(Pokemon$Generation == 1)]))
## [1] 14
But we run into a problem due to the way that Pokemon types are set up. A Pokemon has one or two types, and Type.2 carries the same weight as Type.1. This means that we need to count both when looking at how many Pokemon of each type are in each generation. For example, the first Pokemon in the data, Bulbasaur, counts as both a “Grass” and a “Poison” type. Within a single generation, however, the two tables no longer list the same types: for generation 1, “Dark”, “Flying” and “Steel” appear only in the Type.2 table, while “Bug”, “Electric”, “Fire”, “Ghost” and “Normal” appear only in the Type.1 table. The tables therefore cannot simply be added together as before, and we will have to combine them ourselves.
Since we will most likely have to do this for every generation, let's make a function that fills in the missing “types” and returns one vector with the combined values for both Type.1 and Type.2.
# this function combines the two type tables into one vector containing every unique type from both Type.1 and Type.2, adding up the counts of types that appear in both
combineTypeTblByGen = function(generation){
  # merging the table of Type.1 and Type.2 (minus the empty "" entry) for the generation
  # caution: without a `by` argument, merge() matches rows on the type name AND the count,
  # so a type with the same count in both tables collapses into one row and is counted only once
  mergedTypeTable = merge(table(Pokemon$Type.1[which(Pokemon$Generation == generation)]), table(Pokemon$Type.2[which(Pokemon$Generation == generation)])[-1], all = TRUE)
  # vector version of the merged tables above
  typeVec = mergedTypeTable[[2]]
  names(typeVec) = mergedTypeTable[[1]]
  # for loop that adds up the repeated values for the types, and then removes the duplicate name+value from the vector
  len = length(typeVec) - 1
  for(i in 1:len){
    if(i >= length(typeVec)){break}
    else if(names(typeVec[i]) == names(typeVec[i + 1])){
      typeVec[i+1] = typeVec[i] + typeVec[i+1]
      typeVec = typeVec[-c(i)]
    }
  }
  return(typeVec)
}
barplot(combineTypeTblByGen(1), las=2, main="Frequency of Types for Generation 1")
The plot above combines the frequency of “types” in both Type.1 and Type.2 for generation 1, and because we have the function combineTypeTblByGen(), we can replicate this for every other generation as well.
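One detail of merge() is worth checking when two frequency tables are combined this way: when no `by` argument is given, rows are matched on every shared column, which here means both the type name and the count. The sketch below illustrates the effect on hypothetical data in which one type has the same count in both columns, and shows one possible alternative that merges on the name only. The names `type1`, `type2`, `d1`, `d2`, `naive`, `fixed` and `combined` are illustrative and not part of the original analysis.

```r
# Hypothetical data: "Dark" occurs 3 times as a primary type
# and 3 times as a secondary type
type1 <- c("Dark", "Dark", "Dark", "Fire")
type2 <- c("Dark", "Dark", "Dark", "Water")

# Frequency tables as data frames with the default Var1/Freq column names
d1 <- setNames(as.data.frame(table(type1), stringsAsFactors = FALSE), c("Var1", "Freq"))
d2 <- setNames(as.data.frame(table(type2), stringsAsFactors = FALSE), c("Var1", "Freq"))

# Default merge: rows are matched on Var1 and Freq together, so the two
# identical ("Dark", 3) rows become a single row
naive <- merge(d1, d2, all = TRUE)

# Merging on the name only keeps both counts in separate columns
fixed <- merge(d1, d2, by = "Var1", all = TRUE, suffixes = c(".1", ".2"))
fixed$Freq.1[is.na(fixed$Freq.1)] <- 0   # type absent from the first table
fixed$Freq.2[is.na(fixed$Freq.2)] <- 0   # type absent from the second table
combined <- setNames(fixed$Freq.1 + fixed$Freq.2, fixed$Var1)
```

In `naive`, “Dark” appears once with a count of 3, while `combined` reports 6 for the same data.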
Since we can now build the combined table for every generation, we can move on to comparing the generations by type.
This next function will allow us to see how many Pokemon of one specific type are in every generation. For example, it will show us the number of “Bug” types in generations 1 through 6.
# function takes a character argument naming a Pokemon type and returns a vector of length 6, where each element is the number of Pokemon of that type in the corresponding generation
compareTypeAcrossGen = function(type){
  oneTypeAllGensVec = c()
  for(i in 1:6){
    # a type that does not occur in generation i yields NA here
    oneTypeAllGensVec[i] = combineTypeTblByGen(i)[type]
  }
  names(oneTypeAllGensVec) = c("Gen 1", "Gen 2", "Gen 3", "Gen 4", "Gen 5", "Gen 6")
  return(oneTypeAllGensVec)
}
compareTypeAcrossGen("Bug")
## Gen 1 Gen 2 Gen 3 Gen 4 Gen 5 Gen 6
## 14 12 14 11 18 3
compareTypeAcrossGen("Dark")
## Gen 1 Gen 2 Gen 3 Gen 4 Gen 5 Gen 6
## 1 8 13 7 16 3
Note that these Dark counts sum to 48, while the overall tables earlier give 31 + 20 = 51 Dark entries. The most likely cause is the merge() call in combineTypeTblByGen(): it matches rows on both the type name and the count, so when a type has the same count in the Type.1 and Type.2 tables of a generation, the two rows collapse into one. That points to generation 6, where Dark would have 3 entries in each table and the combined count should read 6 rather than 3. The Bug counts are unaffected (they sum to 72 = 69 + 3).
barplot(compareTypeAcrossGen("Bug"), main="Frequency of Bug Types in Each Generation(Gen)", las=2, col = "lightgreen")
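The per-type comparison can also be obtained for all types at once with a two-way table. The following is a minimal sketch on hypothetical data (`toy`, `long` and `typeByGen` are illustrative names): both type columns are stacked into one long data frame, and table() then cross-tabulates type against generation.

```r
# Hypothetical data: four Pokemon across two generations
toy <- data.frame(
  Type.1 = c("Bug", "Bug", "Dark", "Water"),
  Type.2 = c("Poison", "", "Bug", ""),
  Generation = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# One row per (Pokemon, type) pair, without the empty second types
long <- data.frame(
  Type = c(as.character(toy$Type.1), as.character(toy$Type.2)),
  Generation = rep(toy$Generation, 2),
  stringsAsFactors = FALSE
)
long <- long[long$Type != "", ]

# Rows are types, columns are generations
typeByGen <- table(long$Type, long$Generation)

# Counts of one type across generations
typeByGen["Bug", ]
```

A type that is absent from a generation shows up as 0 in this table rather than as a missing value.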
We have the comparisons across generations for individual types, but covering every type this way would take 18 separate barplots. Instead, let's use the plotly package to create an interactive barplot that shows the frequency of each type across the generations, to get a broader understanding of the dataset.
q2FinalGraph
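The code that builds q2FinalGraph is not shown here. As a rough sketch of how a grouped, interactive bar chart of this kind could be put together with plotly, the example below uses a small hypothetical data set (`long`, `plotData` and `fig` are illustrative names, and the chart options are assumptions, not the settings used for q2FinalGraph). The plotly calls are guarded so that the data preparation still runs where the package is not installed.

```r
# Hypothetical long-format data: one row per (Pokemon, type) pair
long <- data.frame(
  Type = c("Bug", "Bug", "Dark", "Water", "Poison", "Bug"),
  Generation = c(1, 1, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Count every type/generation combination, including the zero counts
plotData <- as.data.frame(
  table(Type = long$Type, Generation = long$Generation),
  responseName = "Count"
)

# Build the chart only if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(plotData, x = ~Type, y = ~Count,
                         color = ~Generation, type = "bar")
  fig <- plotly::layout(fig, barmode = "group",
                        xaxis = list(title = "Type"),
                        yaxis = list(title = "Frequency"))
}
```

Printing `fig` in an interactive session or an R Markdown chunk displays the chart, with one group of bars per type and one bar per generation.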